Learning from Humans as an I-POMDP
Authors
Abstract
The interactive partially observable Markov decision process (I-POMDP) is a recently developed framework which extends the POMDP to the multi-agent setting by including agent models in the state space. This paper argues for formulating the problem of an agent learning interactively from a human teacher as an I-POMDP, where the agent programming to be learned is captured by random variables in the agent’s state space, all signals from the human teacher are treated as observed random variables, and the human teacher, modeled as a distinct agent, is explicitly represented in the agent’s state space. The main benefits of this approach are: i. a principled action selection mechanism, ii. a principled belief update mechanism, iii. support for the most common teacher signals, and iv. the anticipated production of complex beneficial interactions. The proposed formulation, its benefits, and several open questions are presented.
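To make the formulation concrete, the sketch below shows a simplified, discrete Bayesian belief update over "interactive states" that pair a physical state with a hypothesized teacher model, conditioning on an observed teacher signal. This is an illustrative toy, not the paper's implementation: the state names, teacher models, and the binary reward signal are all hypothetical assumptions introduced here.

```python
import itertools

# Hypothetical teacher models (not from the paper): each maps the
# physical state to the probability that the teacher emits a reward signal.
TEACHER_MODELS = {
    "lenient": {"good": 0.9, "bad": 0.4},
    "strict":  {"good": 0.8, "bad": 0.1},
}
STATES = ["good", "bad"]


def normalize(belief):
    """Rescale a dict of non-negative weights so they sum to 1."""
    total = sum(belief.values())
    return {k: v / total for k, v in belief.items()}


def belief_update(belief, teacher_signal):
    """Bayes update over interactive states (state, teacher_model),
    given a binary teacher signal (1 = reward observed, 0 = no reward)."""
    new_belief = {}
    for (s, m), p in belief.items():
        p_signal = TEACHER_MODELS[m][s]
        likelihood = p_signal if teacher_signal == 1 else 1.0 - p_signal
        new_belief[(s, m)] = p * likelihood
    return normalize(new_belief)


# Uniform prior over the four interactive states.
prior = {(s, m): 0.25 for s, m in itertools.product(STATES, TEACHER_MODELS)}

# Observing a reward shifts mass toward interactive states under which
# a reward was likely, jointly inferring the state and the teacher model.
posterior = belief_update(prior, teacher_signal=1)
```

The key point the sketch illustrates is that a single Bayes update refines the agent's belief about the teacher's model and the underlying state at the same time, which is what including the teacher in the state space buys over a plain POMDP.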
Similar articles
Dialogue POMDP components (part I): learning states and observations
The partially observable Markov decision process (POMDP) framework has been applied in dialogue systems as a formal framework to represent uncertainty explicitly while being robust to noise. In this context, estimating the dialogue POMDP model components is a significant challenge as they have a direct impact on the optimized dialogue POMDP policy. To achieve such an estimation, we propose meth...
Dialogue POMDP components (Part II): learning the reward function
The partially observable Markov decision process (POMDP) framework has been applied in dialogue systems as a formal framework to represent uncertainty explicitly while being robust to noise. In this context, estimating the dialogue POMDP model components (states, observations, and reward) is a significant challenge as they have a direct impact on the optimized dialogue POMDP policy. Learning sta...
Intent-aware Multi-agent Reinforcement Learning
This paper proposes an intent-aware multi-agent planning framework as well as a learning algorithm. Under this framework, an agent plans in the goal space to maximize the expected utility. The planning process takes the belief of other agents’ intents into consideration. Instead of formulating the learning problem as a partially observable Markov decision process (POMDP), we propose a simple bu...
Modeling recursive reasoning by humans using empirically informed interactive POMDPs
Recursive reasoning of the form what do I think that you think that I think (and so on) arises often while acting rationally in multiagent settings. Several multiagent decision-making frameworks such as RMM, I-POMDP and the theory of mind model recursive reasoning as integral to an agent’s rational choice. Real-world application settings for multiagent decision making are often mixed involving ...
Model-Based Online Learning of POMDPs
Learning to act in an unknown partially observable domain is a difficult variant of the reinforcement learning paradigm. Research in the area has focused on model-free methods — methods that learn a policy without learning a model of the world. When sensor noise increases, model-free methods provide less accurate policies. The model-based approach — learning a POMDP model of the world, and comp...
Journal: CoRR
Volume: abs/1204.0274
Pages: -
Publication date: 2012